Results 1 - 11 of 11
1.
Article in English | MEDLINE | ID: mdl-38356218

ABSTRACT

A key challenge for machine intelligence is to learn new visual concepts without forgetting previously acquired knowledge. Continual learning (CL) aims to address this challenge. However, a gap still exists between CL and human learning. In particular, humans can continually learn from samples with known or unknown labels in their daily lives, whereas existing CL and semi-supervised CL (SSCL) methods assume that all training samples come with known labels. Specifically, we are interested in two questions: 1) how to utilize unrelated unlabeled data for the SSCL task and 2) how unlabeled data affect learning and catastrophic forgetting in the CL task. To explore these issues, we formulate a new SSCL method, which can be generically applied to existing CL models. Furthermore, we propose a novel gradient learner that learns from labeled data to predict gradients on unlabeled data. In this way, unlabeled data can fit into the supervised CL framework. We extensively evaluate the proposed method on mainstream CL methods, adversarial CL (ACL), and semi-supervised learning (SSL) tasks. The proposed method achieves state-of-the-art classification accuracy and backward transfer (BWT) in the CL setting while achieving the desired classification accuracy in the SSL setting. This implies that unlabeled images can enhance the generalizability of CL models to unseen data and significantly alleviate catastrophic forgetting. The code is available at https://github.com/luoyan407/grad_prediction.git.
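The gradient-learner idea above can be caricatured in a few lines: fit a regressor from inputs to true per-sample gradients on labeled data, then use its predictions as pseudo-gradients for unlabeled samples. Everything below (the 1-D logistic model, the least-squares "gradient learner", the data) is a hypothetical toy, not the paper's architecture.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def true_grad(w, x, y):
    # Gradient of the binary cross-entropy loss w.r.t. w for a 1-D
    # logistic model sigmoid(w * x)
    return (sigmoid(w * x) - y) * x

def fit_gradient_learner(w, labeled):
    # Least-squares fit g(x) ~ a*x + b from labeled (x, y) pairs to the
    # true per-sample gradients; a crude stand-in for the paper's learner
    xs = [x for x, _ in labeled]
    gs = [true_grad(w, x, y) for x, y in labeled]
    n = len(xs)
    mx, mg = sum(xs) / n, sum(gs) / n
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (g - mg) for x, g in zip(xs, gs)) / var if var else 0.0
    b = mg - a * mx
    return lambda x: a * x + b

w = 0.0
labeled = [(1.0, 1), (2.0, 1), (-1.0, 0), (-2.0, 0)]
unlabeled = [1.5, -1.5]

g_hat = fit_gradient_learner(w, labeled)
# Labeled samples contribute true gradients; unlabeled ones, predicted gradients
step = sum(true_grad(w, x, y) for x, y in labeled) + sum(g_hat(x) for x in unlabeled)
w -= 0.5 * step / (len(labeled) + len(unlabeled))
```

With this setup the unlabeled points participate in the same supervised-style update as the labeled ones, which is the mechanism the abstract describes.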

2.
Article in English | MEDLINE | ID: mdl-38289843

ABSTRACT

Conventional surface electromyography (sEMG)-based gesture recognition systems exhibit impressive performance in controlled laboratory settings. However, because most systems are trained in a closed-set setting, their performance may deteriorate significantly when novel gestures are presented as impostors. In addition, state-of-the-art generative and discriminative methods have achieved their performance mainly on high-density sEMG signals, which is an unrealistic setting since real-world muscle-computer interfaces mainly rely on sparse multichannel sEMG signals. In this work, we propose a novel variational autoencoder-based approach for open-set gesture recognition from sparse multichannel sEMG signals. A conditional Gaussian distribution in the latent space is learned for each known gesture from its predefined prior; samples with low probability density are then identified as unknown gestures. The sEMG signals of known gestures are classified using the Kullback-Leibler divergences between the predefined prior distributions and input samples. The proposed approach is evaluated on three benchmark sparse multichannel sEMG databases. The experimental results demonstrate that our approach outperforms existing open-set sEMG-based gesture recognition approaches and achieves a better trade-off between classifying known gestures and rejecting unknown ones.
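The rejection rule can be sketched in isolation: score a latent sample under each known gesture's Gaussian and reject when even the best log-density falls below a threshold. This minimal sketch assumes a 1-D latent space and hypothetical gesture names; it illustrates only the density-threshold rejection, not the paper's VAE or its KL-based classifier.

```python
import math

def log_pdf(z, mu, sigma):
    # Log-density of a univariate Gaussian N(mu, sigma^2)
    return -0.5 * math.log(2 * math.pi * sigma**2) - (z - mu)**2 / (2 * sigma**2)

# Hypothetical known gestures with predefined latent priors (mu, sigma)
priors = {"rest": (0.0, 1.0), "fist": (4.0, 1.0)}

def classify(z, threshold=-4.0):
    # Pick the best-matching known gesture; reject as "unknown" if even
    # the best log-density is below the threshold
    best = max(priors, key=lambda g: log_pdf(z, *priors[g]))
    return best if log_pdf(z, *priors[best]) >= threshold else "unknown"
```

A latent sample near neither prior's mean is rejected, which is exactly the open-set behavior the abstract targets.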


Subjects
Gestures; Recognition, Psychology; Humans; Electromyography/methods; Algorithms; Hand/physiology
3.
Bioengineering (Basel) ; 10(9)2023 Sep 20.
Article in English | MEDLINE | ID: mdl-37760203

ABSTRACT

To enhance the performance of surface electromyography (sEMG)-based gesture recognition, we propose a novel network-agnostic two-stage training scheme, called sEMGPoseMIM, that produces trial-invariant representations aligned with the corresponding hand movements via cross-modal knowledge distillation. In the first stage, an sEMG encoder is trained via cross-trial mutual information maximization, contrasting sEMG sequences sampled from the same time step but different trials. In the second stage, the learned sEMG encoder is fine-tuned under the supervision of gesture labels and hand movements in a knowledge-distillation manner. In addition, we propose a novel network called sEMGXCM as the sEMG encoder. Comprehensive experiments on seven sparse multichannel sEMG databases demonstrate the effectiveness of the training scheme sEMGPoseMIM and the network sEMGXCM, which achieves an average improvement of +1.3% over existing methods. Furthermore, training sEMGXCM and other existing networks from scratch shows that sEMGXCM outperforms the others by an average of +1.5%.
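The first-stage objective resembles a standard contrastive (InfoNCE-style) loss, where embeddings of the same time step from different trials are positives and other time steps in the batch are negatives. The sketch below is a generic InfoNCE implementation under that assumption, not sEMGPoseMIM's actual code.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def info_nce(anchors, positives, temperature=0.1):
    # anchors[i] should match positives[i]; every other entry of
    # positives in the batch acts as a negative for anchor i
    loss = 0.0
    for i, a in enumerate(anchors):
        logits = [dot(a, p) / temperature for p in positives]
        loss += -logits[i] + math.log(sum(math.exp(l) for l in logits))
    return loss / len(anchors)
```

Minimizing this loss pulls same-time-step, cross-trial embeddings together while pushing apart embeddings of different time steps, which is the trial-invariance the scheme is after.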

4.
IEEE Trans Pattern Anal Mach Intell ; 45(1): 525-538, 2023 Jan.
Article in English | MEDLINE | ID: mdl-35130150

ABSTRACT

Motivated by scenarios where data is used for diverse prediction tasks, we study whether fair representation can guarantee fairness for unknown tasks and for multiple fairness notions. We consider seven group fairness notions that cover the concepts of independence, separation, and calibration. Against the backdrop of the fairness impossibility results, we explore approximate fairness. We prove that, although fair representation might not guarantee fairness for all prediction tasks, it does guarantee fairness for an important subset of tasks: those for which the representation is discriminative. Specifically, all seven group fairness notions are linearly controlled by the fairness and discriminativeness of the representation. When an incompatibility exists between different fairness notions, fair and discriminative representation hits the sweet spot that approximately satisfies all notions. Motivated by our theoretical findings, we propose to learn both fair and discriminative representations using a self-supervised pretext loss and Maximum Mean Discrepancy as a fairness regularizer. Experiments on tabular, image, and face datasets show that, using the learned representation, downstream predictions unknown at representation-learning time indeed become fairer. The fairness guarantees computed from our theoretical results are all valid.
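The Maximum Mean Discrepancy regularizer mentioned above has a compact closed form. The sketch below computes the (biased) squared-MMD estimator with an RBF kernel between two groups' representations; a value near zero means the groups are nearly indistinguishable in the kernel feature space, which is the fairness pressure the regularizer applies. The vectors and `gamma` value are illustrative assumptions.

```python
import math

def rbf(u, v, gamma=1.0):
    # Gaussian (RBF) kernel between two representation vectors
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def mmd2(X, Y, gamma=1.0):
    # Biased squared-MMD estimator between two sets of representations
    def mean_k(A, B):
        return sum(rbf(a, b, gamma) for a in A for b in B) / (len(A) * len(B))
    return mean_k(X, X) + mean_k(Y, Y) - 2 * mean_k(X, Y)
```

Added to the training loss, `mmd2(group_a_reps, group_b_reps)` penalizes representations whose distributions differ across the sensitive groups.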

5.
IEEE Trans Pattern Anal Mach Intell ; 43(6): 1928-1946, 2021 Jun.
Article in English | MEDLINE | ID: mdl-31902755

ABSTRACT

One of the well-known challenges in computer vision is the visual diversity of images, which can result in agreement or disagreement between the learned knowledge and the visual content of the current observation. In this work, we first define such agreement in a concept learning process as congruency. Formally, given a particular task and a sufficiently large dataset, the congruency issue arises in the learning process when the task-specific semantics in the training data are highly varying. We propose a Direction Concentration Learning (DCL) method to improve congruency in the learning process, where enhancing congruency makes the convergence path less circuitous. The experimental results show that the proposed DCL method generalizes to state-of-the-art models and optimizers and improves performance on saliency prediction, continual learning, and classification tasks. Moreover, it helps mitigate the catastrophic forgetting problem in continual learning. The code is publicly available at https://github.com/luoyan407/congruency.

6.
IEEE Trans Neural Netw Learn Syst ; 31(2): 685-699, 2020 Feb.
Article in English | MEDLINE | ID: mdl-31094695

ABSTRACT

Intraclass compactness and interclass separability are crucial indicators to measure the effectiveness of a model to produce discriminative features, where intraclass compactness indicates how close the features with the same label are to each other and interclass separability indicates how far away the features with different labels are. In this paper, we investigate intraclass compactness and interclass separability of features learned by convolutional networks and propose a Gaussian-based softmax (G-softmax) function that can effectively improve intraclass compactness and interclass separability. The proposed function is simple to implement and can easily replace the softmax function. We evaluate the proposed G-softmax function on classification data sets (i.e., CIFAR-10, CIFAR-100, and Tiny ImageNet) and on multilabel classification data sets (i.e., MS COCO and NUS-WIDE). The experimental results show that the proposed G-softmax function improves the state-of-the-art models across all evaluated data sets. In addition, the analysis of the intraclass compactness and interclass separability demonstrates the advantages of the proposed function over the softmax function, which is consistent with the performance improvement. More importantly, we observe that high intraclass compactness and interclass separability are linearly correlated with average precision on MS COCO and NUS-WIDE. This implies that the improvement of intraclass compactness and interclass separability would lead to the improvement of average precision.
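As an illustrative stand-in for a Gaussian-based softmax (not necessarily the paper's exact formulation; the shared `mu` and `sigma` here are assumptions), one can squash each class score through a Gaussian CDF before the usual exponential normalization:

```python
import math

def gauss_cdf(x, mu=0.0, sigma=1.0):
    # CDF of N(mu, sigma^2) via the error function
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def g_softmax(scores, mu=0.0, sigma=1.0):
    # Squash each class score through a Gaussian CDF, then normalize
    transformed = [gauss_cdf(s, mu, sigma) for s in scores]
    exps = [math.exp(t) for t in transformed]
    z = sum(exps)
    return [e / z for e in exps]
```

Because the CDF is monotone, the class ranking is preserved while the Gaussian shape reweights how strongly score differences translate into probability differences.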

7.
IEEE Trans Image Process ; 29: 237-249, 2020.
Article in English | MEDLINE | ID: mdl-31369377

ABSTRACT

Unsupervised video object segmentation aims to automatically segment moving objects in an unconstrained video without any user annotation. So far, only a few unsupervised online methods have been reported in the literature, and their performance is still far from satisfactory because complementary information from future frames cannot be exploited in the online setting. To address this challenging problem, we propose a novel unsupervised online video object segmentation (UOVOS) framework that construes the motion property of a segmented region as moving in concurrence with a generic object. By incorporating salient motion detection and object proposals, a pixel-wise fusion strategy is developed to effectively remove detection noise such as dynamic backgrounds and stationary objects. Furthermore, by leveraging the segmentation from immediately preceding frames, a forward propagation algorithm is employed to deal with unreliable motion detection and object proposals. Experimental results on several benchmark datasets demonstrate the efficacy of the proposed method. Compared to state-of-the-art unsupervised online segmentation algorithms, the proposed method achieves an absolute gain of 6.2%. Moreover, our method performs better than the best unsupervised offline algorithm on the DAVIS-2016 benchmark dataset. Our code is available on the project website: https://www.github.com/visiontao/uovos.
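The pixel-wise fusion idea can be sketched as an intersection of binary masks: keep a pixel only when it is both salient in motion and covered by an object proposal, which suppresses dynamic background (motion without objectness) and stationary objects (objectness without motion). The grids below are hypothetical, and real fusion would operate on soft confidence maps rather than hard binary masks.

```python
def fuse(motion_mask, object_mask):
    # Keep only pixels that are both moving and object-like
    return [[m & o for m, o in zip(mrow, orow)]
            for mrow, orow in zip(motion_mask, object_mask)]

motion = [[1, 1, 0],
          [0, 1, 0]]   # moving regions (includes dynamic background at [0][0])
objects = [[0, 1, 1],
           [0, 1, 1]]  # object proposals (includes a stationary object at [0][2])
segmentation = fuse(motion, objects)
```

Only the region that is simultaneously moving and object-like survives the fusion.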

8.
PLoS One ; 14(9): e0221390, 2019.
Article in English | MEDLINE | ID: mdl-31513592

ABSTRACT

Sensor-based human activity recognition aims to detect various physical activities performed by people with ubiquitous sensors. Unlike existing deep learning-based methods, which mainly extract black-box features from raw sensor data, we propose a hierarchical multi-view aggregation network based on multi-view feature spaces. Specifically, we first construct various views of the feature space for each individual sensor in terms of white-box and black-box features. Our model then learns a unified representation for the multi-view features by aggregating views hierarchically at the feature, position, and modality levels, with one aggregation module designed for each level. Based on the ideas of non-local operations and attention, our fusion method captures the correlations between features and leverages the relationships across different sensor positions and modalities. We comprehensively evaluate our method on 12 human activity benchmark datasets, and the resulting accuracy outperforms state-of-the-art approaches.
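A single attention-style aggregation step, of the kind used at each level, can be sketched as a softmax-weighted sum of view vectors. The fixed scoring vector below stands in for a learned attention parameter, and the two-view, two-dimensional setup is purely illustrative.

```python
import math

def aggregate(views, score_vec):
    # Score each view vector against a (here: fixed, hypothetical) scoring
    # vector, softmax the scores, and combine views as a weighted sum
    scores = [sum(a * b for a, b in zip(v, score_vec)) for v in views]
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(views[0])
    return [sum(w * v[d] for w, v in zip(weights, views)) for d in range(dim)]
```

Stacking such steps per feature, per sensor position, and per modality yields the hierarchical aggregation the abstract describes.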


Subjects
Human Activities; Pattern Recognition, Automated/methods; Algorithms; Benchmarking; Humans; Neural Networks, Computer; Recognition, Psychology
9.
IEEE Trans Biomed Eng ; 66(10): 2964-2973, 2019 Oct.
Article in English | MEDLINE | ID: mdl-30762526

ABSTRACT

Gesture recognition using sparse multichannel surface electromyography (sEMG) is a challenging problem, and current solutions are far from optimal from the perspective of muscle-computer interfaces. In this paper, we address this problem in the context of multi-view deep learning. A novel multi-view convolutional neural network (CNN) framework is proposed that combines classical sEMG feature sets with a CNN-based deep learning model. The framework consists of two parts. In the first part, multi-view representations of sEMG are modeled in parallel by a multistream CNN, and a performance-based view construction strategy is proposed to choose the most discriminative views from classical feature sets for sEMG-based gesture recognition. In the second part, the learned multi-view deep features are fused through a view aggregation network composed of early and late fusion subnetworks, taking advantage of both fusion strategies. Evaluations on 11 sparse multichannel sEMG databases, as well as five databases with both sEMG and inertial measurement unit data, demonstrate that our multi-view framework outperforms single-view methods on both unimodal and multimodal sEMG data streams.


Subjects
Deep Learning; Electromyography/methods; Gestures; User-Computer Interface; Datasets as Topic; Humans
10.
PLoS One ; 13(10): e0206049, 2018.
Article in English | MEDLINE | ID: mdl-30376567

ABSTRACT

Surface electromyography (sEMG)-based gesture recognition with deep learning plays an increasingly important role in human-computer interaction. Existing deep learning architectures are mainly based on the Convolutional Neural Network (CNN), which captures the spatial information of the electromyogram signal. Motivated by the sequential nature of the electromyogram signal, we propose an attention-based hybrid CNN and RNN (CNN-RNN) architecture to better capture its temporal properties for gesture recognition. Moreover, we present a new sEMG image representation method, based on a traditional feature vector, that enables deep learning architectures to extract implicit correlations between different channels of sparse multi-channel electromyogram signals. Extensive experiments on five sEMG benchmark databases show that the proposed method outperforms all reported state-of-the-art methods on both sparse multi-channel and high-density sEMG databases. To compare with existing work, we set the window length to 200 ms for NinaProDB1 and NinaProDB2, and to 150 ms for the BioPatRec sub-database, CapgMyo sub-database, and csl-hdemg databases. The recognition accuracies on the aforementioned benchmark databases are 87.0%, 82.2%, 94.1%, 99.7%, and 94.5%, which are 9.2%, 3.5%, 1.2%, 0.2%, and 5.2% higher than the state-of-the-art performance, respectively.
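The "sEMG image" idea can be sketched as a simple reshape: the features extracted from each electrode channel become one row of a 2-D grid, so a 2-D convolution can mix information across channels. The sizes and values below are hypothetical, and the paper's actual representation may arrange features differently.

```python
def to_semg_image(feature_vector, n_channels):
    # Split a flat per-channel feature vector into one row per channel
    n_feats = len(feature_vector) // n_channels
    return [feature_vector[c * n_feats:(c + 1) * n_feats]
            for c in range(n_channels)]

img = to_semg_image([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], n_channels=2)
# img is a 2 x 3 grid: one row per electrode channel
```

Feeding such a grid to a CNN lets convolutional filters span neighboring channels, capturing the inter-channel correlations the abstract mentions.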


Subjects
Algorithms; Attention/physiology; Electromyography; Gestures; Neural Networks, Computer; Pattern Recognition, Automated; Databases as Topic; Humans; Image Processing, Computer-Assisted; Signal Processing, Computer-Assisted; Time Factors
11.
IEEE Trans Pattern Anal Mach Intell ; 37(10): 2057-70, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26340257

ABSTRACT

A significant body of literature on saliency modeling predicts where humans look in a single image or video. Beyond the scientific goal of understanding how information from multiple visual sources is fused to identify regions of interest in a holistic manner, multi-camera saliency has tremendous engineering applications due to the widespread deployment of cameras. This paper proposes a principled framework that smoothly integrates visual information from multiple views into a global scene map and employs a saliency algorithm incorporating high-level features to identify the most important regions by fusing visual information. The proposed method has the following key distinguishing features compared with its counterparts: (1) the saliency detection is global (salient regions in one local view may not be important in a global context); (2) it does not require special camera deployment or overlapping fields of view; and (3) the key saliency algorithm effectively highlights interesting object regions even though no single object detector is used. Experiments on several datasets confirm the effectiveness of the proposed framework.


Subjects
Algorithms; Image Processing, Computer-Assisted/methods; Machine Learning; Pattern Recognition, Automated/methods; Behavior/physiology; Humans